Range-Efficient Counting of Distinct Elements in a Massive Data Stream

Authors

  • A. Pavan
  • Srikanta Tirthapura
Abstract

Efficient one-pass estimation of F0, the number of distinct elements in a data stream, is a fundamental problem arising in various contexts in databases and networking. We consider range-efficient estimation of F0: estimation of the number of distinct elements in a data stream where each element of the stream is not just a single integer but an interval of integers. We present a randomized algorithm which yields an (ε, δ)-approximation of F0, with the following time and space complexities (n is the size of the universe of the items): (1) The amortized processing time per interval is $O(\log\frac{1}{\delta}\log\frac{n}{\epsilon})$. (2) The workspace used is $O(\frac{1}{\epsilon^2}\log\frac{1}{\delta}\log n)$ bits. Our algorithm improves upon a previous algorithm by Bar-Yossef, Kumar and Sivakumar [Proceedings of the 13th ACM–SIAM Symposium on Discrete Algorithms (SODA), 2002, pp. 623–632], which requires $O(\frac{1}{\epsilon^5}\log\frac{1}{\delta}\log n)$ processing time per item. This algorithm can also be used to compute the max-dominance norm of a stream of multiple signals and significantly improves upon the previous best time and space bounds by Cormode and Muthukrishnan [Proceedings of the 11th European Symposium on Algorithms (ESA), Lecture Notes in Comput. Sci. 2938, Springer, Berlin, 2003, pp. 148–160]. This algorithm also provides an efficient solution to the distinct summation problem, which arises during data aggregation in sensor networks [Proceedings of the 2nd International Conference on Embedded Networked Sensor Systems, ACM Press, New York, 2004, pp. 250–262; Proceedings of the 20th International Conference on Data Engineering (ICDE), 2004, pp. 449–460].
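
To make the quantity being estimated concrete, the following is a minimal sketch of an exact, non-streaming baseline. It is not the randomized algorithm of the paper: it stores and sorts the whole input, which is exactly what a streaming algorithm avoids. The names `exact_range_f0` and `max_dominance_intervals` are hypothetical, and the encoding of the max-dominance norm as intervals assumes non-negative integer signal values bounded by `max_value`.

```python
from typing import Iterable, List, Tuple

Interval = Tuple[int, int]  # closed integer interval [lo, hi], assuming lo <= hi


def exact_range_f0(stream: Iterable[Interval]) -> int:
    """Exact number of distinct integers covered by a collection of intervals.

    Baseline only: sorts all intervals and merges overlapping or adjacent ones.
    """
    total = 0
    cur_lo = cur_hi = None
    for lo, hi in sorted(stream):
        if cur_hi is None or lo > cur_hi + 1:
            # The new interval starts a fresh run; count the finished one.
            if cur_hi is not None:
                total += cur_hi - cur_lo + 1
            cur_lo, cur_hi = lo, hi
        else:
            # Overlapping or adjacent: extend the current run.
            cur_hi = max(cur_hi, hi)
    if cur_hi is not None:
        total += cur_hi - cur_lo + 1
    return total


def max_dominance_intervals(signals: List[List[int]], max_value: int) -> List[Interval]:
    """Encode each positive entry a[j][i] as [i*max_value, i*max_value + a[j][i] - 1].

    Assumes 0 <= a[j][i] <= max_value, so blocks for different indices i are
    disjoint and the union over j at index i has size max_j a[j][i].
    """
    intervals = []
    for signal in signals:
        for i, a in enumerate(signal):
            if a > 0:
                base = i * max_value
                intervals.append((base, base + a - 1))
    return intervals


if __name__ == "__main__":
    # Integers 1..8 and 10 are covered.
    print(exact_range_f0([(1, 5), (3, 8), (10, 10)]))
    # Pointwise maxima of the two signals are 3, 4, 2.
    signals = [[3, 0, 2], [1, 4, 2]]
    print(exact_range_f0(max_dominance_intervals(signals, max_value=4)))
```

Under these assumptions, the distinct count of the encoded intervals equals the sum of the pointwise maxima of the signals, which is one way to see why a range-efficient F0 estimator can also serve the max-dominance norm.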

Related papers

Range-Efficient Counting of Distinct Elements in a Massive Data

Efficient one-pass estimation of F0, the number of distinct elements in a data stream, is a fundamental problem arising in various contexts in databases and networking. We consider range-efficient estimation of F0: estimation of the number of distinct elements in a data stream where each element of the stream is not just a single integer, but an interval of integers. We present a randomized alg...

An Evaluation of Streaming Algorithms for Distinct Counting Over a Sliding Window

Counting the number of distinct elements in a data stream (distinct counting) is a fundamental aggregation task in database query processing, query optimization, and network monitoring. On a stream of elements, it is commonly needed to compute an aggregate over only the most recent elements, leading to the problem of distinct counting over a “sliding window” of the stream. We present a detailed...

Range Efficient Computation of F0 over Massive Data Streams

Efficient one-pass computation of F0, the number of distinct elements in a data stream, is a fundamental problem arising in various contexts in databases and networking. We consider the problem of efficiently estimating F0 of a data stream where each element of the stream is an interval of integers. We present a randomized algorithm which gives an (ε, δ)-approximation of F0, with the following ...

Counting Distinct Elements in a Data Stream

We present three algorithms to count the number of distinct elements in a data stream to within a factor of 1 ± ε. Our algorithms improve upon known algorithms for this problem, and offer a spectrum of time/space tradeoffs.

Counting distinct objects over sliding windows

Aggregation over distinct objects arises in many real applications where duplicates are present, including real-time monitoring of moving objects. In this paper, we investigate the problem of counting distinct objects over sliding windows with arbitrary lengths. We present novel, time- and space-efficient, one-scan algorithms to continuously maintain a sketch so that the counting c...

Journal:
  • SIAM J. Comput.

Volume 37

Published 2007